09. Investigating the Data

Question:

You've started the data wrangling process by loading the data and making sure it's in a good format. The next step is to investigate the data and see whether it contains any inconsistencies or problems that you'll need to clean up.

For each of the three files you've loaded, find the total number of rows in the CSV file and the number of unique students. To find the number of unique students in each table, you might want to try creating a set of the account keys.
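As a rough illustration of the set-based idea, here is a minimal sketch on made-up data. The rows in `sample_rows`, the helper name `count_unique`, and the `account_key` column name are assumptions chosen for the example; check the actual column names in each of your tables before reusing it.

```python
# Toy rows standing in for the list of dictionaries a DictReader produces.
# The values are invented for illustration only.
sample_rows = [
    {'account_key': '448', 'status': 'canceled'},
    {'account_key': '448', 'status': 'current'},
    {'account_key': '700', 'status': 'current'},
]

def count_unique(rows, key='account_key'):
    """Return the number of distinct values stored under `key` (hypothetical helper)."""
    # A set keeps only one copy of each value, so its length is the unique count.
    return len({row[key] for row in rows})

num_rows = len(sample_rows)                      # total rows in the table
num_unique_students = count_unique(sample_rows)  # distinct account keys
```

Here the toy table has three rows but only two distinct account keys, because one student appears twice. Passing the column name as a parameter keeps the helper usable if a table stores the key under a different name.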

Again, in case you're not finished with your local setup, you can complete this exercise in the Udacity code editor. You'll need to run the next exercise locally, though, so if you haven't finished setting up, you should do that now.

Start Quiz:

import unicodecsv

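def read_csv(filename):
    """Read a CSV file and return its rows as a list of dictionaries."""
    # unicodecsv expects the file to be opened in binary mode ('rb').
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)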

enrollments = read_csv('/datasets/ud170/udacity-students/enrollments.csv')
daily_engagement = read_csv('/datasets/ud170/udacity-students/daily_engagement.csv')
project_submissions = read_csv('/datasets/ud170/udacity-students/project_submissions.csv')
    
### For each of these three tables, find the number of rows in the table and
### the number of unique students in the table. To find the number of unique
### students, you might want to create a set of the account keys in each table.

enrollment_num_rows = 0             # Replace this with your code
enrollment_num_unique_students = 0  # Replace this with your code

engagement_num_rows = 0             # Replace this with your code
engagement_num_unique_students = 0  # Replace this with your code

submission_num_rows = 0             # Replace this with your code
submission_num_unique_students = 0  # Replace this with your code

Solution:

INSTRUCTOR NOTE:

Solutions

If you want to check our solution for the problem, look at the end of this lesson for Quiz Solutions.